NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

MLRegTest: A benchmark for the machine learning of regular languages [DATASET]

https://doi.org/10.5061/dryad.dncjsxm4h

Van_Der_Poel, Sam; Lambert, Dakotah; Kostyszyn, Kalina; Gao, Tiantian; Verma, Rahul; Andersen, Derek; Chau, Joanne; Peterson, Emily; St_Clair, Cody; Fodor, Paul; et al (January 2023, Dryad)

MLRegTest is a benchmark for machine learning systems on sequence classification, which contains training, development, and test sets from 1,800 regular languages. MLRegTest organizes its languages according to their logical complexity (monadic second order, first order, propositional, or monomial expressions) and the kind of logical literals (string, tier-string, subsequence, or combinations thereof). The logical complexity and choice of literal provides a systematic way to understand different kinds of long-distance dependencies in regular languages, and therefore to understand the capacities of different ML systems to learn such long-distance dependencies.
more » « less
Querying Knowledge via Multi-Hop English Questions

Gao, Tiantian; Fodor, Paul; Kifer, Michael (September 2019, ICLP 2019)

The inherent difficulty of knowledge specification and the lack of trained specialists are some of the key obstacles on the way to making intelligent systems based on the knowledge representation and reasoning (KRR) paradigm commonplace. Knowledge and query authoring using natural language, especially controlled natural language (CNL), is one of the promising approaches that could enable domain experts, who are not trained logicians, to both create formal knowledge and query it. In previous work, we introduced the KALM system (Knowledge Authoring Logic Machine) that supports knowledge authoring (and sim- ple querying) with very high accuracy that at present is unachievable via machine learning approaches. The present paper expands on the question answering aspect of KALM and introduces KALM-QA (KALM for Question Answering) that is capable of answering much more complex English questions. We show that KALM-QA achieves 100% accuracy on an extensive suite of movie-related questions, called MetaQA, which contains almost 29,000 test questions and over 260,000 training questions. We contrast this with a published machine learning approach, which falls far short of this high mark.
more » « less
Full Text Available
High accuracy question answering via hybrid controlled natural language

Gao, Tiantian; Fodor, Paul; Kifer, Michael (December 2018, IEEE/WIC/ACM International Conference on Web Intelligence (WI2018))

Knowledge representation and reasoning (KRR) is key to the vision of the intelligent Web. Unfortunately, wide deployment of KRR is hindered by the difficulty in specifying the requisite knowledge, which requires skills that most domain experts lack. A way around this problem could be to acquire knowledge automatically from documents. The difficulty is that, KRR requires high-precision knowledge and is sensitive even to small amounts of errors. Although most automatic information extraction systems developed for general text understandings have achieved remarkable results, their accuracy is still woefully inadequate for logical reasoning. A promising alternative is to ask the domain experts to author knowledge in Controlled Natural Language (CNL). Nonetheless, the quality of knowledge construc- tion even through CNL is still grossly inadequate, the main obstacle being the multiplicity of ways the same information can be described even in a controlled language. Our previous work addressed the problem of high accuracy knowledge authoring for KRR from CNL documents by introducing the Knowledge Au- thoring Logic Machine (KALM). This paper develops the query aspect of KALM with the aim of getting high precision answers to CNL questions against previously authored knowledge and is tolerant to linguistic variations in the queries. To make queries more expressive and easier to formulate, we propose a hybrid CNL, i.e., a CNL with elements borrowed from formal query languages. We show that KALM achieves superior accuracy in semantic parsing of such queries.
more » « less
Full Text Available
Knowledge authoring for rule-based reasoning

Gao, Tiantian; Fodor, Paul; Kifer, Michael (October 2018, Ontologies, DataBases, and Applications of Semantics)

Modern knowledge bases have matured to the extent of being capable of complex reasoning at scale. Unfortunately, wide deployment of this technology is still hindered by the fact that specifying the requisite knowledge requires skills that most domain experts do not have, and skilled knowledge engineers are in short supply. A way around this problem could be to acquire knowledge from text. However, the current knowledge acquisition technologies for information extraction are not up to the task because logic reasoning systems are extremely sensitive to errors in the acquired knowledge, and existing techniques lack the required accuracy by too large of a margin. Because of the enormous complexity of the problem, controlled natural languages (CNLs) were proposed in the past, but even they lack high enough accuracy. Instead of tackling the general problem of text understanding, our interest is in a related, but different, area of knowledge authoring—a technology designed to enable domain experts to manually create formalized knowledge using CNL. Our approach adopts and formalizes the FrameNet methodology for representing the meaning, enables incrementally-learnable and explainable semantic parsing, and harnesses rich knowledge graphs like BabelNet in the quest to obtain unique, disambiguated meaning of CNL sentences. Our experiments show that this approach is 95.6% accurate in standardizing the semantic relations extracted from CNL sentences—far superior to alternative systems.
more » « less
Full Text Available
Knowledge authoring for rule-based reasoning

Gao, Tiantian; Fodor, Paul; Kifer, Michael (October 2018, The 17th International Conference on Ontologies, DataBases, and Applications of Semantics)

Modern knowledge bases have matured to the extent of being capable of complex reasoning at scale. Unfortunately, wide deployment of this technology is still hindered by the fact that specifying the req- uisite knowledge requires skills that most domain experts do not have, and skilled knowledge engineers are in short supply. A way around this problem could be to acquire knowledge from text. However, the current knowledge acquisition technologies for information extraction are not up to the task because logic reasoning systems are extremely sensitive to er- rors in the acquired knowledge, and existing techniques lack the required accuracy by too large of a margin. Because of the enormous complexity of the problem, controlled natural languages (CNLs) were proposed in the past, but even they lack high enough accuracy. Instead of tackling the general problem of text understanding, our interest is in a related, but different, area of knowledge authoring—a technology designed to enable domain experts to manually create formalized knowledge using CNL. Our approach adopts and formalizes the FrameNet methodology for rep- resenting the meaning, enables incrementally-learnable and explainable semantic parsing, and harnesses rich knowledge graphs like BabelNet in the quest to obtain unique, disambiguated meaning of CNL sentences. Our experiments show that this approach is 95.6% accurate in standard- izing the semantic relations extracted from CNL sentences—far superior to alternative systems.
more » « less
Full Text Available

Search for: All records